Automatic extraction of angiogenesis bioprocess from text

نویسندگان

  • Xinglong Wang
  • Iain McKendrick
  • Ian Barrett
  • Ian Dix
  • Tim French
  • Jun'ichi Tsujii
  • Sophia Ananiadou
چکیده

MOTIVATION Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is searching the literature. However, bioprocesses are difficult to capture because they may occur in text in a variety of textual expressions. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, which current techniques, relying solely on specialized lexicons, struggle to find. RESULTS This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold standard corpus in which terms and events related to angiogenesis, a key biological process of the growth of new blood vessels, were annotated. Statistics of the annotated corpus revealed that over 36% of the text expressions that referred to angiogenesis appeared as events. The proposed methods respectively employed domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance and less cost to develop. AVAILABILITY The angiogenesis vocabularies, gold standard corpus, annotation guidelines and software described in this article are available at http://text0.mib.man.ac.uk/~mbassxw2/angiogenesis/ CONTACT [email protected].

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

    Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

متن کامل

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Improvement of generative adversarial networks for automatic text-to-image generation

This research is related to the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous researches have used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions that are presented in the form of sentences to produce and improve the image. The proposed ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 27  شماره 

صفحات  -

تاریخ انتشار 2011